PROACTIVE PREDICTIVE PREVENTATIVE NETWORK MANAGEMENT

TECHNIQUE 

Technical Field 

[0001] This invention relates to a technique for monitoring a data network to provide
an indication of when a failure may occur. 

Background Art 

[0002] In recent years, the needs of large telecommunications subscribers have evolved 
from a demand for conventional long distance service to a need for high-bandwidth data 
transmission capability. As the sophistication of large telecommunications subscribers 
has advanced, so has their capability to monitor the quality of services they receive. 
Many large subscribers of data communications services now have the ability to detect 
deviations in the quality of service they receive, often in advance of any detection by the 
carrier providing the service. 

[0003] Present-day performance monitoring systems employed by telecommunications 
carriers typically operate by providing an alarm indication when a particular condition 
(attribute) exhibited by a network element crosses an alarm threshold. Such systems do 
not necessarily provide the most practical solution to the problem of monitoring network 
performance. In practice, setting alarm thresholds to a low setting to track trouble 
signatures will yield a large number of alarms, often overwhelming network technicians. 
On the other hand, setting alarm thresholds to a relatively high setting will prevent 
detection of a network element undergoing a gradual failure.

[0004] Thus, there is a need for a network monitoring technique that affords a
telecommunications carrier the ability to track the performance of its network by
detecting the gradual performance degradation of network elements over time.

Brief Summary of the Invention

[0005] Briefly, in accordance with a preferred embodiment, there is provided a method 
for maintaining the performance of a network, and more particularly, a data 
communications network, that includes at least one element, such as a router or switch for 
example. In accordance with the method, at least one attribute of the element is 
monitored periodically (e.g., hourly, daily or weekly). The monitored attribute is 
compared to a corresponding threshold value. Such monitoring and comparison yields an 
historic performance trend for the element from which a determination can be made 
whether there is at least one crucial attribute of the element that warrants closer
monitoring. If closer monitoring is warranted, then the element is monitored in near
real time, say every ten minutes, to determine whether a persistent performance 
degradation exists. If so, then the network element is altered, either by repair or 
replacement, to ameliorate the performance degradation. The foregoing method enables 
a network operator to better isolate those network elements that exhibit degraded 
performance, thus affording the network operator the ability to fix the trouble before the 
subscriber becomes aware of the problem. 

Brief Description of the Drawings 

[0006] FIGURE 1 depicts a block schematic diagram of a communications network
monitored in accordance with the teachings of the prior art; and 
[0007] FIGURE 2 depicts the communications network of FIG. 1 as monitored in 
accordance with the teachings of the invention. 

Detailed Description 

[0008] FIGURE 1 depicts a communications network 10 comprised of a plurality of
network elements (e.g., routers) 11₁-11ₘ (where m is an integer) interconnected by links
12₁-12ₙ (where n is an integer). The network 10 communicates traffic (i.e., data packets)
between two or more hosts, exemplified by hosts 13 and 14. A first router 16 links the
host 13 to a first Local Exchange Carrier (LEC) 18. A first Backbone-to-Horizontal
Cross-connect (BHC) 20 connects the LEC 18 to router 11₁ within the network 10. The
router 11₁ is "homed" to the host 13 and serves as the ingress/egress router for that host.
A second router 22 links the host 14 to a second Local Exchange Carrier (LEC) 24. A
second Backbone-to-Horizontal Cross-connect (BHC) 26 connects the LEC 24 to router
11₂. The router 11₂ is "homed" to the host 14 and serves as the ingress/egress router for
that host.

[0008] In practice, the routers 16 and 22 that ultimately link the hosts 13 and 14 to the
routers 11₁ and 11₂, respectively, each have real-time communications and link
performance monitoring ability. Using this capability, each of the hosts 13 and 14 can
detect whether the Quality-of-Service (QoS) provided by the network 10 meets an agreed
quality level, usually governed by a Service Level Agreement (SLA) between the host
and the operator of the network 10. In an effort to provide the agreed-to QoS, the
operator of the network 10 typically employs a fault management system 30 that
monitors alarms generated by each of the elements 11₁-11ₘ as well as the BHCs 20 and
26 on a periodic basis. Such received alarm signals are filtered and correlated. Any
alarm condition that deviates from a threshold value will cause the fault management
system 30 to generate a notification to one or more network technicians. A separate
monitoring system 32 monitors the performance of the links 12₁-12ₙ by periodically
receiving alarm signals associated with one or more link attributes and filtering and
correlating such signals to determine if any attribute associated with a particular link
deviates from an associated threshold value.

[0009] Typical present-day fault management systems, such as the systems 30 and 32,
often suffer from an inability to proactively predict gradually degraded performance on a
monitored element to allow the network operator to take timely action to prevent
diminution in the QoS afforded each of the hosts 13 and 14. Setting each alarm limit
relatively low to track the "signature" of a monitored element generates a large number of
notifications that often overwhelm the network technicians. Conversely, setting each
alarm limit relatively high to avoid the problem of overwhelming network technicians incurs
the difficulty of detecting gradual performance degradation of one or more network
elements.
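
The trade-off between a low and a high alarm limit can be illustrated with a minimal Python sketch. The readings and both threshold values below are invented for illustration and do not come from any actual network element; the helper name `alarms` is likewise hypothetical.

```python
# Illustrative only: the readings and both alarm limits are assumed values.
# The series stands for one attribute (e.g., an error count per polling
# period) of an element that is degrading gradually.
readings = [1, 2, 2, 3, 4, 5, 6, 7, 8, 9]


def alarms(values, threshold):
    """Return the indices of the readings that cross a fixed alarm limit."""
    return [i for i, v in enumerate(values) if v > threshold]


low = alarms(readings, threshold=2)   # low limit: fires on most periods
high = alarms(readings, threshold=9)  # high limit: never fires on this data

print(len(low), len(high))
```

With these assumed values, the low limit raises an alarm in seven of the ten periods, while the high limit raises none even though the attribute rises steadily throughout.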

[0010] FIG. 2 shows a network 10, like the network 10 of FIG. 1, for communicating 
traffic (i.e., data packets) between two or more hosts, exemplified by hosts 13 and 14. 
Like reference numerals have been used in FIG. 2 as in FIG. 1 to designate like elements. 
In accordance with present principles, the network 10 incorporates a next generation 
performance management system 30' for providing proactive predictive preventative 
network management. The management system 30' of FIG. 2 has three functional 
components represented by elements 34a, 34b, and 34c although, as may become better 
understood hereinafter, a single module could perform the functions individually 
performed by each of the components. Component 34a monitors various network 
elements (e.g., routers 11₁-11ₘ, links 12₁-12ₙ and the BHCs 20 and 26) as well as the host
routers 16 and 22 and the links connecting the BHCs to the ingress routers of the network 
10, to establish an historical trend for each such monitored element.

[0011] To establish an historic trend for each monitored element, the component 34a
within the performance management system 30' periodically acquires the value of one or 
more attributes of each monitored element, on a weekly, daily or even hourly basis. The 
component 34a then filters and correlates the attribute values to determine which attribute 
exceeds an associated prescribed threshold. For example, the component 34a of the 
performance management system 30' may establish an historical trend by creating a 
histogram of the attribute values for each monitored element that exceed the associated 
threshold. 
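
One possible way to build such a histogram is sketched below in Python. The sketch assumes that each periodic sample arrives as an (element, attribute, value) tuple and that an attribute "exceeds" its threshold when its value is strictly greater; the function name, attribute names, and threshold values are hypothetical and serve only as an example.

```python
from collections import Counter


def exceedance_histogram(samples, thresholds):
    """Count, per (element, attribute), the samples that exceeded the threshold.

    samples:    iterable of (element, attribute, value) tuples (assumed format)
    thresholds: dict mapping attribute name to its prescribed threshold
    """
    hist = Counter()
    for element, attribute, value in samples:
        limit = thresholds.get(attribute)
        # Attributes without a prescribed threshold are ignored.
        if limit is not None and value > limit:
            hist[(element, attribute)] += 1
    return hist


# Hypothetical thresholds and periodic samples.
thresholds = {"crc_errors": 10, "cpu_util": 80}
samples = [
    ("router-1", "crc_errors", 12),
    ("router-1", "crc_errors", 15),
    ("router-1", "cpu_util", 40),
    ("router-2", "crc_errors", 3),
    ("router-1", "crc_errors", 18),
]
hist = exceedance_histogram(samples, thresholds)
print(hist.most_common())
```

Elements whose counts keep growing from one collection period to the next are candidates for closer monitoring.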

[0012] The information indicative of the historical performance trends of network 
elements monitored by the component 34a passes to a component 34b that serves to 
monitor, in near real time, critical performance attributes of elements exhibiting 
performance degradations, as identified by historical performance monitoring. For 
example, if component 34a determines that a monitored element, say router 11₁, persists
in its performance degradation, the second component 34b within the performance
management system 30' begins near real-time monitoring of that element. In particular,
the component 34b commences near real-time monitoring of critical performance
attributes by detecting their values at much shorter intervals, say every ten minutes, as
compared to the performance monitoring interval of the component 34a. By monitoring
at least the critical performance attributes of those network elements experiencing
persistent performance degradation, the component 34b assures that such network
elements receive greater scrutiny than would normally occur with conventional
performance monitoring systems. Thus, the monitoring provided by the
performance management system 30' of the invention eliminates the problem of having an
overwhelming number of alarm conditions, but still affords the opportunity to detect
gradual performance degradation. In addition to performing near real-time monitoring of
crucial attributes, the component 34b may also perform near real-time monitoring of
other attributes.
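
The hand-off from periodic monitoring to near real-time monitoring can be sketched as follows. The description does not define when a degradation counts as persistent, so the rule used here (at least three threshold exceedances among the last five periodic checks) is purely an assumption, as are the function names and the choice of an hourly routine interval.

```python
from datetime import timedelta

ROUTINE_INTERVAL = timedelta(hours=1)            # periodic monitoring (hourly assumed)
NEAR_REAL_TIME_INTERVAL = timedelta(minutes=10)  # "say every ten minutes"


def is_persistent(exceeded_flags, window=5, min_hits=3):
    """Assumed rule: degradation is persistent when at least `min_hits` of the
    last `window` periodic checks exceeded the threshold."""
    recent = exceeded_flags[-window:]
    return sum(recent) >= min_hits


def polling_interval(exceeded_flags):
    """Pick the monitoring interval for an element from its recent history."""
    if is_persistent(exceeded_flags):
        return NEAR_REAL_TIME_INTERVAL
    return ROUTINE_INTERVAL


# One flag per periodic check: True means the attribute exceeded its threshold.
print(polling_interval([False, True, True, False, True]))
print(polling_interval([True, False, False, False, False]))
```

Under this assumed rule, only the first element is moved to the ten-minute interval; the single isolated exceedance on the second element leaves it on the routine schedule.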

[0013] If performance degradations persist in one or more monitored elements, thus 
revealing "hot spots" associated with the network 10, the network operator can make 
alterations by repairing or replacing elements at the source of the problem. In this way, 
the network elements exhibiting degraded performance are identified first through
tracking historical trends and then through tracking the trouble signatures detected by
frequent monitoring. Performance alarms may also provide information in addition to, or
in place of, such trouble signatures, although such performance alarms are not always
supported by certain network elements because of the load placed on such elements by
the overhead imposed by performance alarms.

[0014] In addition to the component 34b that performs near real-time monitoring of the
network elements, the performance management system 30' may also include a component
34c that performs real-time monitoring of those particular elements that exhibit degraded
performance (i.e., the hot spots discussed previously). To that end, the component 34c 
monitors such elements exhibiting degraded performance virtually instantaneously 
(during very short intervals, much shorter than the monitoring interval of component 
34b), thus permitting the network operator to alter a network element by repair or 
replacement to ameliorate a diminution of the quality of service. In this way, the network 
operator can fix the trouble before a subscriber becomes aware of the problem.

[0015] The process by which the performance management system 30' determines
which, if any, network element exhibits persistent performance degradation not only
depends on collecting meaningful information, but also depends on a knowledge of the 
failure mode of each monitored element. Thus the performance management process 
performed by the system 30' depends on knowing the various ways a monitored element 
may fail, and the particular attributes necessary for monitoring to detect such a potential 
failure. Accordingly, the system 30' typically includes a fault model for each monitored 
element. Such a fault model not only provides an indication of how various monitored 
attributes change as performance degradation progresses, but may also provide an 
indication of additional attributes that may require monitoring upon detecting a 
performance degradation. 
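
A fault model of this kind might be represented as a simple data structure, as in the Python sketch below. The description does not specify any representation, so the class names, the failure modes, and the attribute names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FaultMode:
    name: str
    # attribute -> expected direction of change as degradation progresses
    indicators: dict
    # additional attributes to monitor once degradation is detected
    follow_up: list = field(default_factory=list)


@dataclass
class FaultModel:
    element_type: str
    modes: list

    def attributes_to_monitor(self):
        """Attributes needed to detect any of the modeled failure modes."""
        return sorted({a for m in self.modes for a in m.indicators})

    def additional_attributes(self, degraded_attribute):
        """Extra attributes to monitor after one attribute shows degradation."""
        extra = set()
        for mode in self.modes:
            if degraded_attribute in mode.indicators:
                extra.update(mode.follow_up)
        return sorted(extra)


# Hypothetical fault model for a router.
router_model = FaultModel(
    element_type="router",
    modes=[
        FaultMode("line card degradation", {"crc_errors": "rising"},
                  ["optical_rx_power", "frame_drops"]),
        FaultMode("memory exhaustion", {"free_memory": "falling"},
                  ["cpu_util"]),
    ],
)
print(router_model.attributes_to_monitor())
print(router_model.additional_attributes("crc_errors"))
```

Keeping the follow-up attributes with each failure mode lets the monitoring components widen what they collect only for elements that already show degradation.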

[0016] The foregoing describes a technique for accomplishing proactive predictive 
preventative network management. 

[0017] The above-described embodiments merely illustrate the principles of the 
invention. Those skilled in the art may make various modifications and changes that will 
embody the principles of the invention and fall within the spirit and scope thereof. 